UNIVERSITY  OF  CALIFORNIA  PUBLICATIONS 

IN 

AGRICULTURAL    SCIENCES 

Vol.  4,  No.  7,  pp.  159-181,  12  text  figures  September  10,  1920 


A   NEW  AND   SIMPLIFIED    METHOD    FOR  THE 

STATISTICAL  INTERPRETATION  OF 

BIOMETRICAL DATA¹

BY 

GEORGE  A.  LINHART 


SECTION  I 

The  derivation  of  the  Law  of  Probability  may  be  found  in  any  text 
on  the  subject.  Here  we  shall  assume  its  validity  and  use  it  to  obtain 
the  several  quantities  which  serve  as  criteria  in  statistical  calculations. 

In  the  fundamental  equation 

\[ y = k\,e^{-h^{2}x^{2}} \qquad [1] \]

there  are  two  characteristic  constants,  k  and  h,  whose  numerical  values 
must  be  known  for  a  given  set  of  data  before  we  can  proceed  with  any 
calculations.  A  simple  and  at  the  same  time  exact  method  of  obtaining 
the  numerical  values  for  those  constants  forms  the  subject  of  this  paper. 
Since y equals k when x equals zero, k is the probability of an error of
zero and will therefore be defined here as the largest number of
measurements of a given set having the same numerical value; while y will
denote any number of measurements whose group value ranges from
zero to the group value of the number of measurements denoted by
k or y0. Equation (1) then becomes

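\[ \frac{y}{y_0} = e^{-h^{2}x^{2}} \qquad [2] \]

which by means of logarithms we have transformed into a linear equation,

\[ \mathrm{Log}\left(2.303\,\mathrm{Log}\,\frac{y_0}{y}\right) = 2\,\mathrm{Log}\,x + 2\,\mathrm{Log}\,h \qquad [3] \]

or

\[ \mathrm{Log}\left(\mathrm{Log}\,\frac{y_0}{y}\right) = 2\,\mathrm{Log}\,x + 2\,\mathrm{Log}\,h - 0.3623 \qquad [4] \]

Collecting 2 Log h and -0.3623 into one constant, we have,

\[ \mathrm{Log}\left(\mathrm{Log}\,\frac{y_0}{y}\right) = 2\,\mathrm{Log}\,x + K \qquad [5] \]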

¹ From the Division of Soil Chemistry and Bacteriology, College of Agriculture,
University of California, Berkeley.



If we now plot Log (Log y0 - Log y) as ordinate and Log x as abscissa,
all the measurements should theoretically fall on a straight line with a
slope of 2, provided the data are susceptible to statistical interpretation;
that is, provided they are truly chance data. Practically, however,
even such data fall on either side of the straight line. Drawing
now the "best" straight line with a slope of 2 through these points, we
can then read off the values on the line as accurately as we choose,
depending upon the size of the scale of plotting, and construct a
"theoretical" frequency curve for comparison with the experimental frequency
curve obtained in the usual way; that is, by plotting the number of
experiments in groups or classes against the measured values.
Frequently y0 does not fall directly over the arithmetical mean. In such
a case the theoretical polygon may be shifted to the left or to the right,
and this corresponds to the parallel shifting of the straight line from
which the values for the construction of the theoretical frequency
polygon have been obtained. Often this theoretical polygon reveals the
fact that the arithmetical mean calculated from the raw data is not
in all cases the "best" mean, for, as frequently happens, one or two
abnormal values will vitiate the mean considerably, especially if the
number of experiments is not sufficiently large. We must, therefore,
so superpose the two polygons as to make their areas approximately
equivalent, since, as will be shown later, the areas play an important
part in the calculation of the probable error. A concrete example will
best illustrate the method of procedure.
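
The graphical step of drawing the "best" line of slope 2 can be imitated numerically. The following is only a minimal sketch, assuming that "best" may be approximated by a weighted mean of the intercepts of the individual points; the function name `fit_K` and the optional weights are illustrative and not part of the original method, which is carried out by eye on the plot. The four points are the observed values later listed in table I.

```python
import math

def fit_K(points, y0, weights=None):
    """Intercept K of Log(Log(y0/y)) = 2*Log(x) + K, with the slope fixed at 2.

    points  : (x, y) pairs; x is the residual without sign, y the class count
    weights : optional relative weights (assumption: larger for points taken
              from the upper portion of the experimental polygon)
    """
    if weights is None:
        weights = [1.0] * len(points)
    total = 0.0
    for (x, y), w in zip(points, weights):
        ordinate = math.log10(math.log10(y0 / y))
        # Each point alone would give the intercept ordinate - 2*Log(x).
        total += w * (ordinate - 2.0 * math.log10(x))
    return total / sum(weights)

# Observed points of table I (y0 = 40): residual x in mg., class count y.
table_I = [(2.0, 3), (2.0, 9), (1.0, 22), (1.0, 26)]
K = fit_K(table_I, y0=40)
print(round(K, 3))
```

With equal weights this gives a value near -0.66, close to the K = -0.670 read from the plot; unequal weights would move it slightly, which is the judgment the graphical method leaves to the eye.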

In a recent paper by Waynick and Sharp (1919) are given the nitrogen
contents of a hundred samples of a local soil. The results are recorded
to 0.001%, based upon ten gram samples, and therefore to 0.1 mg. In
figure I these one hundred results are mapped in groups or classes 0.1 mg.
apart, the circles indicating the number of determinations falling into
each class.² Plotting these classes vertically to a scale of one-half inch
per determination, we obtain the multimodal curve drawn immediately
above the circles. Evidently the analyses are too fine as compared
with the variability of nitrogen in those samples of soil. The
determinations were then grouped in classes 0.5 mg. apart,
resulting in the next curve above. This curve bears some resemblance
to a "frequency" curve, but is still unsatisfactory. However, such
a curve is quite sufficient for the construction of a theoretical frequency
polygon by our straight line method. In this case y0 would equal 25
and could be made to fall directly over the arithmetical mean, 10.0 mg.

² When the circles fall on a line dividing two classes, then if the number of circles
is even they are equally divided between the two classes; if odd, the extra one is put
into that class which helps to make the experimental polygon most symmetrical.



Indeed, if we attempt to plot these data in classes very much farther
apart than 0.5 mg., say 2.0 mg., we obtain a so-called skew curve, and,
finally, we may obtain a line sloping in one direction only when we plot
these data in classes 2.5 mg. apart. It is evident, therefore, that such
skew curves are meaningless.³ In the present case, when we plot the
data in classes 1.0 mg. apart, the curve "skews" but slightly. Here
y0 falls directly over the arithmetical mean, and the one hundred
determinations fall into four classes. With these four points on the curve,
two on each side of the mean and approximately equidistant from it,
we may construct the straight line as shown in figures VII, VIII and
IX, where the values for Log (Log y0 - Log y) are plotted as ordinates
and the values for Log x as abscissae, x denoting the residuals on either
side of the mean without regard to algebraic sign. It should be noted
that in drawing the "best" straight line with the theoretical slope of 2
through such points, proportionately less weight must be given to points
taken from the experimental polygon near the base than to those taken
from the upper portion of the curve. A little practice will soon enable
one to judge at a glance which points are most significant. Having now
obtained the "best" straight line, we may calculate any number of
values for x by means of equation (5), namely:

\[ \mathrm{Log}\left(\mathrm{Log}\,\frac{y_0}{y}\right) = 2\,\mathrm{Log}\,x + K, \]

K  denoting  the  distance  on  the  Log  (Log  y0  —  Log  y)  axis,  or  ordinate, 
from  the  origin  to  the  point  of  its  intersection  by  the  "best"  straight 
line.  In  the  present  example  y0  equals  40  and  y  may  be  taken  anywhere 
from  one  to  thirty-nine,  but  for  the  construction  of  the  theoretical 
polygon  six  to  ten  values  for  y  will  suffice.     These  are  shown  in  table  I. 
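
As a check on the arithmetic, equation (5) can be solved for x and evaluated directly. The sketch below is an illustration only; the function name `residual_for_count` is hypothetical, and the values y0 = 40 and K = -0.670 are those quoted for table I.

```python
import math

def residual_for_count(y, y0, K):
    """Residual x at which the theoretical curve has ordinate y, from equation (5)."""
    log_x = (math.log10(math.log10(y0 / y)) - K) / 2.0
    return 10.0 ** log_x

y0, K = 40, -0.670          # values quoted for table I
for y in (1, 3, 5, 9, 15, 20, 22, 26, 30):
    print(y, round(residual_for_count(y, y0, K), 3))
```

The printed residuals agree with the last column of table I to within the rounding of the three-place logarithms used there.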

Discussion  of  the  Figures 

Figure  I  has  been  fully  discussed.  Figure  II  is  but  another  example 
of  how  to  construct  a  theoretical  polygon  approximately  equivalent  in 
area  to  the  experimental  polygon.  An  interesting  set  of  data  is  that 
mapped  in  figure  III.  Here  the  total  nitrogen  in  each  sample  is  so 
small  that  a  few  samples  might  have  contained  no  measurable  amount  of 
nitrogen  at  all.  The  values  for  the  construction  of  these  two  figures, 
II  and  III,  were  taken  from  a  paper  by  Waynick  (1918). 

The  data  mapped  in  figure  IV  are  recorded  in  a  paper  by  Batchelor 
and  Reed  (1918).  Here  as  in  figure  III  the  theoretical  polygon  indicates 
that  among  the  one  thousand  orange  trees  about  three  might  have  borne 

³ A discussion of truly abnormal curves and their susceptibility to statistical
interpretation will be given in another paper. See also section II of this paper.



no  fruit  at  all  had  they  been  left  wholly  to  chance.  In  fact  one  tree 
yielded  but  five  pounds  of  fruit,  which  is  practically  zero,  while  another 
yielded  341  pounds,  the  mean  of  all  the  thousand  trees  being  137.6 
pounds of fruit. Two more interesting sets of data are those of Wood
(1910) on the dry weights of mangel roots, and of Collins (1912) on
butter fat. These results are mapped in figures V and VI. In figures
VII, VIII and IX is shown the construction of the straight lines from
the experimental data as previously described. Finally, in figure X
are  mapped  the  results  of  bacterial  counts  taken  from  a  recent  article 
in  Science  (1920). 

Calculation  of  the  Index  of  Precision 

Turning  once  more  to  the  straight  line  plots  on  figures  VII,  VIII 
and  IX,  we  see  that  we  may  read  off  the  values  for  K  of  equation  (5) 
to  any  degree  of  accuracy,  depending  upon  the  size  of  the  scale  of  the 
plot.  On  the  above  plots,  20x20  inches,  the  values  for  K  can  be  read 
off  accurately  to  three  places  of  decimals,  which  is  quite  sufficient  for 
most  cases.  With  this  value  for  K  of  a  given  set  of  measurements  we 
can  calculate  the  value  for  h,  the  Index  of  Precision,  as  is  shown  in 
equation (5), where K was put in place of 2 Log h - 0.3623; hence,

\[ h = (10)^{\frac{K + 0.3623}{2}} \qquad [6] \]
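
Equation (6) is a one-line computation. The sketch below merely evaluates it for the K of table I; the function name `index_of_precision` is illustrative.

```python
def index_of_precision(K):
    """Index of Precision h from the intercept K, equation (6)."""
    return 10.0 ** ((K + 0.3623) / 2.0)

h = index_of_precision(-0.670)   # K quoted for table I
print(round(h, 4))
```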

Calculation  of  the  Probable  Error 

The simplest way of calculating the probable error is to take from a
probability integral table the value for hx corresponding to the integral
value 1/2. This value for hx is 0.4769; hence,

\[ x = 0.4769\,(10)^{-\frac{K + 0.3623}{2}} \qquad [7] \]

We might of course draw a straight line through every "class" point
parallel to the "best" straight line and so obtain a probable error for
each class; these, when averaged, would give a mean probable error.
However, in most cases the probable error obtained from the "best"
straight line is more accurate.
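
Equation (7) may be checked in the same way. In this sketch the tabulated probability integral is assumed to be the error function, available as `math.erf` in the Python standard library; the function name `probable_error` is illustrative, and the K values are those quoted for tables I, II and IV.

```python
import math

def probable_error(K):
    """Probable error of one determination from the intercept K."""
    h = 10.0 ** ((K + 0.3623) / 2.0)   # equation (6)
    return 0.4769 / h                  # equation (7)

# The probability integral erf(hx) equals one-half close to hx = 0.4769.
print(round(math.erf(0.4769), 4))

for label, K in (("I", -0.670), ("II", -0.350), ("IV", -4.100)):
    print(label, round(probable_error(K), 2))
```

The three values so obtained agree with the probable errors by the new method entered in the summary table.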

A more instructive method of calculating the probable error is to
make a tracing of the theoretical polygon, which is constructed from the
values read off on the straight line plot, on reasonably uniform tracing
cloth, then carefully to cut out the area under this curve, roll
it up and finally weigh it on an accurate balance. The polygon is then
unrolled and folded along the mode exactly in two and trimmed along
the sides parallel to the fold by means of a photographer's print trimmer



until  it  weighs  exactly  one-half  of  the  original  weight.  Replacing  now 
this  trimmed  tracing  upon  the  original  theoretical  polygon,  we  may  read 
off  the  probable  error  on  the  base  of  the  polygon  at  the  limit  of  the 
tracing. 
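
The weighing procedure amounts to finding the residual that encloses one-half of the area under the theoretical curve. A numerical counterpart is sketched below, assuming the smooth curve of equation (2) in place of the traced polygon and the trapezoidal rule in place of the balance; the function name and the cut-off `span` are illustrative choices.

```python
import math

def half_area_residual(h, steps=20000, span=6.0):
    """Residual x such that the area under exp(-(h*x)**2) from 0 to x is
    one-half of the area from 0 to span/h (the right half of the curve)."""
    dx = (span / h) / steps

    def ordinate(x):
        return math.exp(-(h * x) ** 2)

    # Trapezoidal strips of the right half of the curve.
    strips = [0.5 * (ordinate(i * dx) + ordinate((i + 1) * dx)) * dx
              for i in range(steps)]
    half = 0.5 * sum(strips)
    running = 0.0
    for i, strip in enumerate(strips):
        running += strip
        if running >= half:
            return (i + 1) * dx
    return span / h

h = 10.0 ** ((-0.670 + 0.3623) / 2.0)   # table I, equation (6)
print(round(half_area_residual(h), 2))
```

Because the curve is symmetrical about the mode, trimming the folded tracing to half its weight is the same as halving the area of one side, which is what the loop does.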

Calculation  of  the  Probable  Error  of  the  Arithmetical  Mean 

By means of the Principle of Least Squares it can be shown that the
probable error of the arithmetical mean, x0, is equal to the probable
error (obtained from h) of one determination divided by the square
root of the number of determinations, or,

\[ x_0 = \frac{x}{\sqrt{n}} = \frac{0.4769}{\sqrt{n}}\,(10)^{-\frac{K + 0.3623}{2}} \qquad [8] \]
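
For a concrete figure, the probable error of the mean may be computed for the hundred determinations of table I and the thousand trees of table IV. This is a small sketch; the function name is illustrative.

```python
import math

def probable_error_of_mean(K, n):
    """Probable error of the arithmetical mean of n determinations."""
    x = 0.4769 * 10.0 ** (-(K + 0.3623) / 2.0)   # probable error of one determination
    return x / math.sqrt(n)

print(round(probable_error_of_mean(-0.670, 100), 3))    # table I
print(round(probable_error_of_mean(-4.100, 1000), 2))   # table IV
```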

TABLES  OF  RESULTS 

In the tables below are given in the first columns the number of
determinations falling into each class, while in the last columns are
given the values calculated by means of the straight lines for the
construction of the theoretical polygons. The headings are self-explanatory.
The Roman numerals of each table correspond to the Roman
numerals on the figures constructed from these tables.

I
Calculated from K = -0.670 (last two columns)

  y    Log(y0/y)   Log(Log(y0/y))   x obs.        Log x (obs.)         Log x (calc.)   x (calc.)
  0      +∞            +∞                                                 +∞             ±∞
  1     1.602         +0.205                                              +0.4375        2.739
  3     1.125         +0.051         2.0           +0.301                 +0.3605        2.293
  5     0.903         -0.044                                              +0.3130        2.056
  9     0.648         -0.188         2.0           +0.301                 +0.2410        1.742
 15     0.426         -0.371                                              +0.1495        1.411
 20     0.301         -0.521                                              +0.0745        1.187
 22     0.260         -0.586         1.0            0.000                 +0.0420        1.102
 26     0.187         -0.728         1.0            0.000                 -0.0290        0.935
 30     0.125         -0.903                                              -0.1165        0.765
 35     0.058         -1.237                                              -0.2835        0.521
 40     0.000          -∞                                                  -∞            0.000


II
Calculated from K = -0.350 (last two columns)

  y    Log(y0/y)   Log(Log(y0/y))   x obs.        Log x (obs.)         Log x (calc.)   x (calc.)
  0      +∞            +∞                                                 +∞             ±∞
  1     1.342         +0.128         1.7           +0.230                 +0.239         1.734
  2     1.041         +0.018                                              +0.184         1.528
  3     0.865         -0.063         1.8           +0.255                 +0.144         1.392
  6     0.564         -0.249                                              +0.051         1.124
  8     0.439         -0.357                                              -0.004         0.991
 10     0.342         -0.466         0.8           -0.097                 -0.058         0.875
 15     0.166         -0.780         0.7           -0.155                 -0.215         0.610
 19     0.064         -1.194         0.2 or 0.3    -0.699 or -0.523       -0.422         0.378
 22     0.000          -∞                                                  -∞            0.000




III
Calculated from K = +0.250 (last two columns)

  y    Log(y0/y)   Log(Log(y0/y))   x obs.        Log x (obs.)         Log x (calc.)   x (calc.)
  0      +∞            +∞                                                 +∞             ±∞
  1     1.447         +0.161         1.15          +0.061                 -0.045         0.902
  2     1.146         +0.057                                              -0.097         0.801
  4     0.845         -0.073         0.85          -0.071                 -0.162         0.690
  7     0.602         -0.220                                              -0.235         0.582
  8     0.544         -0.264         0.55          -0.260                 -0.257         0.553
 10     0.447         -0.350                                              -0.300         0.510
 15     0.271         -0.567                                              -0.409         0.390
 18     0.192         -0.717         0.35          -0.456                 -0.484         0.328
 21     0.125         -0.903         0.25          -0.602                 -0.577         0.265
 25     0.049         -1.310                                              -0.779         0.166
 28     0.000          -∞                                                  -∞            0.000

IIIa
Calculated from K = 5.625 (last three columns)

                      ------- Observed -------     ------------- Calculated -------------
  y    Log(y0/y)   (Log(m/m0))^2    Log m          (Log(m/m0))^2   Log(m/m0)    Log m
  0      +∞                                           +∞             ±∞          ±∞
  1     1.591                                         0.2828         0.5318      +0.3318, -0.7318
  5     0.892        0.2304         +0.28, -0.68      0.1585         0.3981      +0.1981, -0.5981
 10     0.591                                         0.1051         0.3241      +0.1241, -0.5241
 18     0.336        0.0576         +0.04, -0.44      0.0597         0.2443      +0.0443, -0.4443
 19     0.312        0.0576         +0.04, -0.44      0.0555         0.2356      +0.0356, -0.4356
 30     0.114                                         0.0203         0.1425      -0.0575, -0.3425
 35     0.047                                         0.0084         0.0917      -0.1083, -0.2917
 39     0.000                                         0.0000         0.0000      -0.2000

IV
Calculated from K = -4.100 (last two columns)

  y    Log(y0/y)   Log(Log(y0/y))   x obs.            Log x (obs.)        Log x (calc.)   x (calc.)
  0      +∞            +∞                                                    +∞             ±∞
  1     2.182         +0.339         137.6 or 202.4    +2.139 or 2.306       +2.220         165.8
  2     1.881         +0.274         162.4              2.211                 2.187         153.8
  3     1.705         +0.232         182.4              2.261                 2.166         146.6
  7     1.337         +0.126         142.4              2.154                 2.113         129.7
  8     1.279         +0.107         117.6              2.070                 2.103         126.9
 17     0.952         -0.021         122.4              2.088                 2.039         109.5
 20     0.881         -0.055         102.4              2.010                 2.022         105.3
 25     0.784         -0.106          97.6              1.989                 1.997          99.3
 51     0.474         -0.324          82.4              1.916                 1.886          76.9
 58     0.419         -0.378          77.6              1.890                 1.861          72.6
 62     0.390         -0.409          62.4              1.795                 1.846          70.1
 91     0.223         -0.652          42.4              1.627                 1.724          53.0
116     0.118         -0.928          57.6              1.760                 1.585          38.5
120     0.103         -0.987          37.6              1.575                 1.556          36.0
124     0.089         -1.051          17.6              1.246                 1.525          33.5
142     0.030         -1.523          22.4              1.350                 1.288          19.4
152     0.000          -∞                                                      -∞             0.0



V
Calculated from K = -0.980 (last two columns)

  y    Log(y0/y)   Log(Log(y0/y))   x obs.        Log x (obs.)         Log x (calc.)   x (calc.)
  0      +∞            +∞                                                 +∞             ±∞
  1     1.623         +0.210         4.0 or 5.0    +0.602 or +0.699       +0.595         3.94
  2     1.322         +0.121         4.0           +0.602                 +0.551         3.55
  7     0.778         -0.109         3.0           +0.477                 +0.436         2.73
  9     0.669         -0.175         3.0           +0.477                 +0.403         2.53
 16     0.419         -0.378         2.0           +0.301                 +0.301         2.00
 17     0.393         -0.406         2.0           +0.301                 +0.287         1.94
 24     0.243         -0.614                                              +0.183         1.52
 32     0.118         -0.928         1.0            0.000                 +0.026         1.06
 33     0.104         -0.983         1.0            0.000                 +0.002         1.00
 38     0.043         -1.367                                              -0.194         0.64
 42     0.000          -∞                                                  -∞            0.00

VI
Calculated from K = +0.700 (last two columns)

  y    Log(y0/y)   Log(Log(y0/y))   x obs.          Log x (obs.)          Log x (calc.)   x (calc.)
  0      +∞            +∞                                                    +∞             ±∞
  1     2.415         +0.383         0.85            -0.071                  -0.159         0.694
  3     1.938         +0.287         0.95            -0.022
  4     1.813         +0.258         0.65 or 0.75    -0.187 or -0.125
  5     1.716         +0.235         0.85            -0.071
  7     1.570         +0.196         0.75            -0.125
  8     1.512         +0.180         0.65            -0.187
 11     1.374         +0.138         0.55            -0.260                  -0.281         0.524
 34     0.884         -0.054         0.45            -0.347
 39     0.824         -0.084         0.55            -0.260                  -0.392         0.406
 45     0.762         -0.118         0.45            -0.347
 58     0.652         -0.186         0.35            -0.456
 63     0.616         -0.210         0.35            -0.456                  -0.455         0.351
 97     0.428         -0.369         0.25            -0.602
137     0.278         -0.556         0.25            -0.602                  -0.628         0.236
200     0.114         -0.943         0.15            -0.824                  -0.822         0.151
205     0.103         -0.987         0.15            -0.824
241     0.033         -1.481         0.05            -1.301                  -1.091         0.081
260     0.000          -∞                                                     -∞            0.000

X
Calculated from K = -2.690 (last two columns)

  y    Log(y0/y)   Log(Log(y0/y))   x obs.        Log x (obs.)         Log x (calc.)   x (calc.)
  0      +∞            +∞                                                 +∞             ±∞
  1     0.903         -0.044         20. or 30.    +1.301 or +1.477       +1.323         21.
  3     0.426         -0.371                                              +1.160         14.
  5     0.204         -0.690         10.           +1.000                 +1.000         10.
  7     0.058         -1.237                                              +0.727          5.
  8     0.000          -∞                                                  -∞             0.



Xa
Calculated from K = 2.450 (last three columns)

                      ------- Observed -------     ------------- Calculated -------------
  y    Log(y0/y)   (Log(m/m0))^2    Log m          (Log(m/m0))^2   Log(m/m0)    Log m
  0      +∞                                           +∞             ±∞          ±∞
  0.2   1.477                                         0.6029         0.7765      +1.9765, +0.4235
  0.5   1.079                                         0.4404         0.6636      +1.8636, +0.5364
  1     0.778                                         0.3176         0.5636      +1.7636, +0.6364
  2     0.477        0.160          +1.60, +0.80      0.1947         0.4413      +1.6413, +0.7613
  5     0.079        0.040          +1.40, +1.00      0.0322         0.1794      +1.3794, +1.0206
  6     0.000                                         0.0000         0.0000      +1.2000

SUMMARY  TABLE 

In the table following, the Roman numerals in the first column refer
either to the tables or to the figures themselves; in the second, third and
fourth columns are given the means, the probable errors and the
probable errors of the means calculated by the new method. In the fifth,
sixth and seventh columns are given the means, the probable errors
and the probable errors of the means taken from the literature listed
at the end of this article.


            ------ By the New Method ------      --- Taken from the Literature ---
                                 Probable                             Probable
                     Probable    error                    Probable    error
Tables      Mean     error       of the mean     Mean     error       of the mean
I           10.0     0.68        0.068           10.      0.60        0.060         W. & S.
II           2.7     0.47        0.052            2.7     0.47        0.052         W.
III          0.65    0.24        0.026            0.7     0.24        0.026         W.
IV         137.6    35.26        1.12           137.6    37.0         1.2           B. & R.
V           14.5     0.97        0.08            14.5     1.1         0.087         Wood
VI           3.05    0.140       0.004            3.07    0.158 (a)   0.004         Collins
X           15.0     7.0         2.0                                                Science

(a) This value is erroneous. Apparently an arithmetical mistake.

SECTION II

One  of  the  fundamental  postulates  of  the  law  of  probability  of  errors 
is  that  positive  and  negative  errors  are  equally  frequent.  This  however 
is  not  generally  true.  It  is  true  for  example  in  military  statistics  where 
the  deviations  from  the  arithmetic  mean  are  small.  Thus  in  measuring 
the  heights  of  soldiers  the  maximum  deviation  from  the  mean  is  never 
more  than  about  one  foot,  while  the  height  of  the  shortest  soldier  is 
about  five  feet.  But  if  we  wish  to  ascertain  say  the  average  number  of 
children  per  family  in  the  United  States  the  frequency  curve  shows  that 
some  families  may  have  negative  children.  For  if  the  average  be  four 
children  per  family,  and  we  know  that  some  families  have  as  many  as 



five  times  that  number,  then,  according  to  the  above  postulate  the 
frequency  curve  must  include  the  count  zero  and  beyond.  This  is 
typical  of  a  great  many  cases  and  is  rather  the  rule  than  the  exception. 
Let  us  now  once  more  examine  some  of  the  figures  of  the  previous 
section  of  this  paper.  It  is  not  only  conceivable  but  very  likely  that 
some  of  the  samples  of  soil  of  which  the  nitrogen  contents  are  mapped  in 
figure III might have yielded no measurable amount of nitrogen. But the
frequency  curve  indicates  that  some  of  the  samples  might  have  contained 
a  quantity  less  than  zero.  The  same  is  true  of  the  yield  of  oranges 
mapped  on  figure  IV  and  of  the  bacterial  counts  mapped  on  figure  X. 
It  must  not  however  be  concluded  from  this  that  the  law  of  probability 
of  errors  does  not  apply  to  these  cases.  It  is  the  particular  form  of 
the  mathematical  expression  for  the  law  of  probability  of  errors  which 
does  not  apply.  We  have  therefore  sought  an  equation  of  such  form 
that  it  should  satisfy  the  postulates  of  the  law  of  probability  of  errors 
and  also  agree  with  experience.     This  equation  is 

\[ \frac{y}{y_0} = e^{-h^{2}\left(\mathrm{Log}\,\frac{m}{m_0}\right)^{2}} \qquad [9] \]

where  m  denotes  the  numerical  value  of  any  measurement  and  m0  the 
value  of  the  geometric  mean.*  The  meanings  of  y,  y0  and  h  are  the 
same  as  those  of  the  same  quantities  in  equation  (2).  Equation  (9) 
states that it is as likely or rather as unlikely that some values for m
be zero as +∞; that is, in either case y/y0 would equal zero. When
m equals m0, y/y0 equals 1; that is, the maximum probability is attained
when the measured values do not deviate from the value of the mean.
Transforming  equation  (9)  into  a  rectilinear  one,  as  has  been  done  with 
equation  (2),  we  obtain 

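\[ \mathrm{Log}\,\frac{y_0}{y} = 2.303\,h^{2}\left(\mathrm{Log}\,\frac{m}{m_0}\right)^{2} \qquad [10] \]

or

\[ \mathrm{Log}\,\frac{y_0}{y} = K\left(\mathrm{Log}\,\frac{m}{m_0}\right)^{2} \qquad [11] \]

Here the logarithm in the exponent of equation (9) is the natural
logarithm, while those in equations (10) and (11) are common logarithms,
2.303 being the factor for passing from one to the other.

Whence h, the index of precision, equals

\[ h = \sqrt{\frac{K}{2.303}} \qquad [12] \]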

We may now proceed with the construction of the experimental
polygons with the values given in columns 1 and 4 of tables IIIa and
Xa, then find the "best" values for K from the straight line equation
(11) and finally construct the theoretical curves from the values given
in columns 1 and 7 of tables IIIa and Xa. The curves so obtained are
shown in figures IIIa and Xa. From these curves the probable errors
may  be  calculated  as  described  in  section  I. 
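
The calculated columns of table IIIa can be reproduced from equation (11). The sketch below assumes K = 5.625 and Log m0 = -0.2000, the values consistent with the calculated columns of that table, and y0 = 39; the function names are illustrative.

```python
import math

def log_m_pair(y, y0, K, log_m0):
    """Two values of Log m at which the theoretical curve has ordinate y."""
    d = math.sqrt(math.log10(y0 / y) / K)   # |Log(m/m0)| from equation (11)
    return log_m0 + d, log_m0 - d

def index_of_precision_log(K):
    """Index of precision h from K, equation (12)."""
    return math.sqrt(K / 2.303)

y0, K, log_m0 = 39, 5.625, -0.2000   # assumed from table IIIa
for y in (1, 5, 10, 18, 19, 30, 35):
    upper, lower = log_m_pair(y, y0, K, log_m0)
    print(y, round(upper, 4), round(lower, 4))
print(round(index_of_precision_log(K), 3))
```

The two values of Log m for each y lie equally far above and below Log m0, so the theoretical curve is symmetrical on a logarithmic scale of m and never reaches values of m below zero.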

*For a mathematical discussion see Galton, and McAlister, Proc. Roy. Soc. Lond.
29:365 (1879).




SUMMARY 

In section I of this paper, the usual mathematical expression for the
law of the probability of errors has been transformed into a rectilinear
form. With the aid of this equation, the statistical criteria for various
sets of data may be calculated very accurately without first finding
and squaring the individual residuals, and an enormous amount of time
and labor may thus be saved.

In  section  II,  it  is  shown  that  the  mathematical  expression  for  the 
law  of  the  probability  of  errors  generally  used  holds  only  where  the 
percentage  deviations  from  the  mean  are  small.  A  general  equation  is 
then  developed,  of  which  the  former  is  but  a  special  case.  For  when  the 
percentage  deviations  from  the  mean  are  small,  that  is,  when  m  is  less 
than  2  m0,  where  m  denotes  the  value  of  any  measurement  and  m0  the 
value  of  the  mean,  our  general  equation 

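\[ \frac{y}{y_0} = e^{-h^{2}\left(\mathrm{Log}\,\frac{m}{m_0}\right)^{2}} \]

may be expanded in series, thus:

\[ \mathrm{Log}\,\frac{y_0}{y} = h^{2}\left[\left(\frac{m}{m_0}-1\right) - \frac{1}{2}\left(\frac{m}{m_0}-1\right)^{2} + \frac{1}{3}\left(\frac{m}{m_0}-1\right)^{3} - \cdots\right]^{2} \]

Neglecting all terms but the first on the right-hand side, we obtain,

\[ \mathrm{Log}\,\frac{y_0}{y} = h^{2}\left(\frac{m-m_0}{m_0}\right)^{2} \]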

which  is  identical  with  the  ordinary  law  of  the  probability  of  errors 
generally  used,  and  most  often  misused,  for,  as  has  been  pointed  out, 
this  equation  holds  only  where  the  percentage  deviations  from  the  mean 
are  small. 
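
How small the percentage deviations must be can be judged numerically. The sketch below compares the squared natural logarithm of m/m0 with the squared fractional deviation retained after the series is cut at its first term; the function names and the chosen percentages are illustrative only.

```python
import math

def log_form(m, m0):
    """Squared natural logarithm of m/m0, as in the general equation."""
    return math.log(m / m0) ** 2

def linear_form(m, m0):
    """Squared fractional deviation, the first term of the series."""
    return ((m - m0) / m0) ** 2

m0 = 1.0
for percent in (1, 5, 10, 25, 50):
    m = m0 * (1.0 + percent / 100.0)
    ratio = log_form(m, m0) / linear_form(m, m0)
    print(percent, round(ratio, 3))
```

The ratio stays close to unity for deviations of a few per cent and falls off markedly as the deviation approaches one-half of the mean, which is the sense in which the ordinary law is a special case.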

Transmitted  April  29,  1920. 




LITERATURE  CITED 

Batchelor, L. D., and Reed, H. S.
   Relation of the Variability of Yields of Fruit Trees to the Accuracy of Field
   Trials. Jour. Agric. Research, vol. 12, p. 245. 1918.

Collins, S. H.
   The Application of the Theory of Errors to Investigations on Milk. Supplement
   7 to the Journal of the Board of Agriculture (London), vol. 18, pp. 48-55. 1911.

Johnstone, James
   The Probable Error of a Bacteriological Analysis. Rept. Lanc. Sea Fish. Lab.,
   1919, no. 27, pp. 64-85; through "Science," vol. 2, pp. 89-91. 1920.

Waynick, D. D.
   Variability in Soils and its Significance to Past and Future Soil Investigations.
   I. A Statistical Study of Nitrification in Soils. Univ. Calif. Publ. Agr. Sci.,
   vol. 3, no. 9, pp. 243-270. 1918.

Waynick, D. D., and Sharp, L. T.
   Variability in Soils and its Significance to Past and Future Soil Investigations.
   II. Variations in Nitrogen and Carbon in Field Soils and their Relation to
   the Accuracy of Field Trials. Univ. Calif. Publ. Agr. Sci., vol. 3, pp. 243-270,
   2 text figures. 1918.

Wood, T. B.
   The Interpretation of the Results of Agricultural Experiments. Supplement 7
   to the Journal of the Board of Agriculture (London), vol. 18, pp. 15-37. 1911.


[Figures I to X, IIIa and Xa (pp. 170-181 of the original) are not reproduced here.
Legible labels: "Percent of Butter Fat in Milk" (figure VI); "Number of Bacterial
Counts" (figure X).]


